Decoder stateful #663

jlibovicky · 2018-03-02T10:06:28Z

This PR makes the autoregressive decoder an instance of TemporalStateful so that we can stack a sequence labeler on top of it.

varisd

Just one comment.

varisd · 2018-03-02T11:08:06Z

neuralmonkey/decoders/autoregressive.py

+            self.train_mode,
+            lambda: tf.transpose(self.train_output_states, [1, 0, 2])[:, :-2],
+            lambda: tf.transpose(
+                self.runtime_output_states, [1, 0, 2])[:, :-2])


Why are you dropping the last two elements here? It might deserve a comment.
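For readers following the question, here is a minimal pure-Python sketch of what the quoted expression does to the shapes, assuming the decoder states are stored time-major as [time, batch, dim]. The helper names are hypothetical and nested lists stand in for tensors. The thread does not say why two steps are dropped rather than one.

```python
from typing import List

Tensor3D = List[List[List[float]]]


def to_batch_major(states: Tensor3D) -> Tensor3D:
    """Swap the first two axes: [time, batch, dim] -> [batch, time, dim]."""
    return [list(per_example) for per_example in zip(*states)]


def drop_last_steps(states: Tensor3D, n: int) -> Tensor3D:
    """Equivalent of states[:, :-n] on a batch-major tensor."""
    return [example[:-n] for example in states]


# 5 time steps, batch of 2, state dimension 1; each value encodes its step.
time_major = [[[float(t)], [float(t) + 10.0]] for t in range(5)]
batch_major = to_batch_major(time_major)
trimmed = drop_last_steps(batch_major, 2)
```

After trimming, each example keeps only its first three time steps.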

jindrahelcl

Two comments plus one request here:

The encoder in the labeler needs to be renamed, and its type in the constructor changed as well. Perhaps to temporal stateful, that probably fits best.

jindrahelcl · 2018-03-02T12:47:02Z

neuralmonkey/decoders/autoregressive.py

+
+    @tensor
+    def temporal_states(self) -> tf.Tensor:
+        return tf.cond(


Please convince me that we want this to depend on the decoder's train mode. Isn't there a realistic situation where we want to train the labeler on runtime states, or run the labeler with the decoder in training mode? (Probably not the latter; however, if I have a fixed decoder and I am training a labeler, self.train_mode will most likely be true, so my labeler catches exposure bias from the decoder, right?)

Imagine you want to train multi-task translation and POS tagging on the output. You have POS tags for the training data. Then you cannot let the decoder generate whatever it wants just so that the labeler does not pick up exposure bias. On the contrary, you have to make sure that the decoder output matches what is in the gold data, because otherwise you have no guarantee that the encoder state corresponds to what you have in the gold data.

If you wanted to train the tags on runtime states, you would have to be able to determine the correct tags dynamically, based on whatever has just been decoded, and that is not possible.

And which word are you labeling with the tagger? The previously decoded one or the one just being decoded? Because if it is the one just being decoded, that is also something different from the reference.

jindrahelcl · 2018-03-02T12:48:14Z

neuralmonkey/decoders/sequence_labeler.py

-            self.encoder.input_sequence.temporal_states, 2)
-        dweights_4d = tf.expand_dims(
-            tf.expand_dims(self.decoding_residual_w, 0), 0)
+        if hasattr(self.encoder, "input_sequence"):


What if self.encoder is a decoder: aren't we interested in the embeddings of the words that go into the decoder's input? That would probably call for a bigger refactor.

I would rather expect you to be interested in the hidden states (or rather, in whatever you then base the output predictions on).

When the labeler works on top of an encoder, it looks at both the states and the inputs. This if is here because it was crashing with a decoder, which has no inputs. But in a way it does have them. Plus, of course, an if via hasattr is ugly.

They could interest us, but it would mean reworking the autoregressive decoder quite a lot, and I do not feel like getting into that. Besides, it would then no longer be an input sequence, and we would need to somehow unify the interface of the things that have embeddings.

cifkao · 2018-03-03T08:06:38Z

Just an idea: what about adding a class DecoderSequence (or it could be a function get_decoder_states that would return some TemporalStateful), which would be told exactly what is wanted from the decoder (states, inputs, logits...) and in which mode?

(It would be best to have it as a decoder method, but a method cannot be called from the config file.)

jindrahelcl · 2018-03-03T16:16:59Z

I would add some magic thing that turns tensors over time (or space) into temporal (or spatial) stateful objects.
Fields can be accessed from the config file.

So in the decoder we would normally have:

@tensor
def runtime_logits(self) -> tf.Tensor:
    #...

then somewhere in some helper module something like

def make_temporal_stateful(tensor: tf.Tensor) -> TemporalStateful:
    #...

and in the config:

[decoder_logits]
class=make_temporal_stateful
tensor=<decoder.runtime_logits>

Or the make_stateful would be called invisibly from the constructor of the consuming object, which would accept Union[tf.Tensor, TemporalStateful] and, if it got a tensor, would call the make function itself.
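A rough sketch of the "invisible" variant, only to make the idea concrete. Everything here is a stand-in: the TemporalStateful class below is a simplified stub rather than the real Neural Monkey interface, the Labeler constructor is hypothetical, and nested lists play the role of tf.Tensor.

```python
from typing import Sequence, Union

States = Sequence[Sequence[float]]


class TemporalStateful:
    """Simplified stand-in for the real interface (assumption)."""

    def __init__(self, temporal_states: States) -> None:
        self._temporal_states = temporal_states

    @property
    def temporal_states(self) -> States:
        return self._temporal_states


def make_temporal_stateful(tensor: States) -> TemporalStateful:
    """Wrap a bare tensor so that it looks like a TemporalStateful."""
    return TemporalStateful(tensor)


class Labeler:
    """Hypothetical consumer that accepts either kind of input."""

    def __init__(self, input_sequence: Union[TemporalStateful, States]) -> None:
        # Coerce bare tensors; pass stateful objects through untouched.
        if not isinstance(input_sequence, TemporalStateful):
            input_sequence = make_temporal_stateful(input_sequence)
        self.input_sequence = input_sequence
```

With this shape the config could reference either a model part or a bare tensor field, at the cost of hiding the conversion inside every consumer.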

cifkao · 2018-03-04T12:26:49Z

@jindrahelcl TemporalStateful also needs a mask (although by default it could probably be None or all ones). And it should probably also be a ModelPart with a dependency on the model part the tensor comes from.

We would have to add some TemporalStatefulAdapter or DefaultTemporalStateful that would require temporal_states, temporal_mask and dependencies. The decoder would create instances of this class (with dependencies=[self]) and expose them as properties that could be referenced from the config file. But that is a bit ugly.

Besides, we may want to keep the option of having it depend on the train mode, but I cannot judge that.
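The adapter described above might look roughly like the following. This is a sketch under assumptions: the class body, the all-ones default mask, and ToyDecoder are illustrative only, and nested lists shaped [batch][time][dim] stand in for tensors.

```python
from typing import List, Optional

States = List[List[List[float]]]  # [batch][time][dim]
Mask = List[List[float]]          # [batch][time]


class TemporalStatefulAdapter:
    """Bundles states, a mask and the model parts they depend on."""

    def __init__(self,
                 temporal_states: States,
                 temporal_mask: Optional[Mask] = None,
                 dependencies: Optional[List[object]] = None) -> None:
        self.temporal_states = temporal_states
        if temporal_mask is None:
            # Default: every time step of every example is valid.
            temporal_mask = [[1.0] * len(example)
                             for example in temporal_states]
        self.temporal_mask = temporal_mask
        self.dependencies = list(dependencies or [])


class ToyDecoder:
    """Hypothetical decoder exposing an adapter as a property."""

    def __init__(self, logits: States) -> None:
        self._logits = logits

    @property
    def runtime_logits(self) -> TemporalStatefulAdapter:
        return TemporalStatefulAdapter(self._logits, dependencies=[self])
```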

jlibovicky · 2018-03-04T21:50:11Z

The magic thing seems unnecessarily complicated to me if we have no use case other than this one.

jlibovicky · 2018-03-05T10:02:03Z

mypy tells me:

neuralmonkey/decoders/transformer.py:110: error: Property "dimension" defined in "TransformerDecoder" is read-only

Does anyone know what this is?

jindrahelcl · 2018-03-05T11:01:40Z

dimension is a @property inherited from TemporalStateful, so it is read-only,

jindrahelcl · 2018-03-05T11:02:35Z

whereas here dimension denotes the model dimension in the sense of Vaswani et al.

jlibovicky · 2018-03-05T11:16:01Z

I see, so shall we rename it to model_dimension in the transformer?
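The mypy complaint can be reproduced in a few lines of plain Python. In this sketch the base class is a simplified stand-in for TemporalStateful and both decoder classes are hypothetical; the point is only that assigning to an inherited setter-less property fails, while a differently named attribute does not collide.

```python
class TemporalStateful:
    """Simplified stand-in: dimension is a property without a setter."""

    @property
    def dimension(self) -> int:
        raise NotImplementedError("Abstract property")


class CollidingDecoder(TemporalStateful):
    def __init__(self, size: int) -> None:
        # mypy reports this as assignment to a read-only property;
        # at runtime it raises AttributeError.
        self.dimension = size  # type: ignore


class RenamedDecoder(TemporalStateful):
    def __init__(self, size: int) -> None:
        # A distinct name no longer shadows the inherited property.
        self.model_dimension = size


def assignment_fails() -> bool:
    """Return True if constructing CollidingDecoder raises AttributeError."""
    try:
        CollidingDecoder(512)
    except AttributeError:
        return True
    return False
```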

jindrahelcl · 2018-03-05T11:32:45Z

We can.

jindrahelcl

A typo, plus I would add a test case where the labeler is stacked on an autoregressive decoder.

jindrahelcl · 2018-03-08T23:03:32Z

neuralmonkey/decoders/sequence_labeler.py

+
+    Note that when the labeler is stacked on an autoregressive decoder, it
+    labels the symbol that is currently generated by the decoder, i.e., the
+    decoder's state has not yet been updated by putting the decoded symbol on


jindrahelcl · 2018-03-08T23:03:49Z

neuralmonkey/decoders/sequence_labeler.py

+    Note that when the labeler is stacked on an autoregressive decoder, it
+    labels the symbol that is currently generated by the decoder, i.e., the
+    decoder's state has not yet been updated by putting the decoded symbol on
+    its input. The label is thus the label of a symbol is generated, not the


Incomprehensible sentence

(the one starting with The label)

jlibovicky · 2018-03-13T09:51:07Z

So, once I added proper tests, it crashes because train_inputs in the decoder is not fed at runtime. But honestly I do not know where they come from, because the tf.cond in the autoregressive decoder looks fine to me.

jindrahelcl · 2018-03-13T09:53:25Z

Try TF 1.5; the cond may let it through even at runtime... but that is what the lambdas are there for, isn't it?

jindrahelcl · 2018-03-13T09:53:44Z

ok, TF 1.6 then :-)

jindrahelcl · 2018-07-02T11:23:11Z

@jlibovicky this is such a trivial thing that it could be rebased and merged, couldn't it?

jlibovicky · 2018-07-02T15:30:09Z

Try it, but I think it was failing on something.

jindrahelcl · 2018-07-11T10:20:30Z

neuralmonkey/decoders/autoregressive.py

+    @tensor
+    def temporal_states(self) -> tf.Tensor:
+        # strip the last symbol which is </s>
+        return tf.cond(


This does not work this way. tf.cond needs all the placeholders, even if it ends up evaluating the other branch.

Yeah, I knew that once already. I think it was supposed to be fixed in some later TF version. How are we doing with compatibility with newer versions?

varisd

Some comments.

varisd · 2018-08-01T19:09:54Z

neuralmonkey/decoders/sequence_labeler.py

-        return tf.reduce_sum(weighted_loss)
+        min_time = tf.minimum(tf.shape(self.train_targets)[1],
+                              tf.shape(self.logits)[1])
+


Do we really want to take the minimum here during training? Isn't there a risk, for example, that the network learns to generate (with the decoder layer) only sequences of length 1, which it will then label correctly?

Shouldn't the logits be padded to the train_targets length, or shorter sequences be penalized in some other way?
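To make the concern concrete, here is a small pure-Python comparison of the two options. It is only a sketch: the function names are hypothetical, per-step probability lists stand in for softmaxed logits, and padding with a uniform distribution is just one possible penalty, not something this PR implements.

```python
import math
from typing import List


def xent(probs: List[float], target: int) -> float:
    """Cross-entropy of one step given the gold label index."""
    return -math.log(probs[target])


def truncated_loss(step_probs: List[List[float]], targets: List[int]) -> float:
    """Loss over the first min(len(outputs), len(targets)) steps only."""
    min_time = min(len(step_probs), len(targets))
    return sum(xent(step_probs[t], targets[t]) for t in range(min_time))


def padded_loss(step_probs: List[List[float]], targets: List[int],
                vocab_size: int) -> float:
    """Pad missing steps with a uniform distribution before scoring."""
    uniform = [1.0 / vocab_size] * vocab_size
    missing = len(targets) - len(step_probs)  # non-positive means no padding
    padded = step_probs + [uniform] * missing
    return sum(xent(padded[t], targets[t]) for t in range(len(targets)))


# The output has a single confident step; the gold labels have four.
outputs = [[0.9, 0.1]]
gold = [0, 1, 1, 0]
short = truncated_loss(outputs, gold)
full = padded_loss(outputs, gold, vocab_size=2)
```

Under truncation the three missing steps cost nothing, so the short output scores almost perfectly; with padding each missing step adds log(2) for this two-label vocabulary.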

varisd · 2018-08-01T19:11:59Z

neuralmonkey/decoders/sequence_labeler.py

+            logits=self.logits[:, :min_time],
+            targets=self.train_targets[:, :min_time],
+            weights=self.input_sequence.temporal_mask[:, :min_time])
+        # pylint: enable=unsubscriptable-object


Is it really okay to replace the sum over logits with a mean here? At least in terms of backward compatibility (of training hyperparameters), it will not be entirely kosher.
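The hyperparameter concern comes down to a scale factor, which the following sketch illustrates with made-up numbers. The function names are hypothetical, and the exact normalization used by the loss in this PR is not visible in the quoted diff, so a weighted mean over unmasked positions is an assumption here.

```python
from typing import List


def summed_loss(losses: List[float], weights: List[float]) -> float:
    """Sum of per-token losses over unmasked positions."""
    return sum(l * w for l, w in zip(losses, weights))


def mean_loss(losses: List[float], weights: List[float]) -> float:
    """Same sum, normalized by the number of unmasked positions."""
    return summed_loss(losses, weights) / sum(weights)


token_losses = [0.5, 1.5, 2.0, 4.0]
mask = [1.0, 1.0, 1.0, 0.0]  # the last position is padding

total = summed_loss(token_losses, mask)
average = mean_loss(token_losses, mask)
scale = total / average  # equals the number of unmasked tokens
```

Because the gradient scales by the same factor, a learning rate tuned for the summed loss would behave like a smaller one after the switch.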

varisd · 2018-08-01T19:26:38Z

neuralmonkey/decoders/transformer.py

+        # of mypy not being able to handle the tf.Tensor type.
+        assert self.encoder_states is not None
+
+        self.model_dimension = (


Wouldn't it be better to move lines 130-136 into an overridden dimension property instead of renaming? As it stands, the code becomes confusing.
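A sketch of this alternative, with the caveat that the body of lines 130-136 is not shown in the thread: the toy class below assumes the value is derived from the encoder state size, and both classes are simplified stand-ins rather than the real ones.

```python
class TemporalStateful:
    """Simplified stand-in with a read-only dimension property."""

    @property
    def dimension(self) -> int:
        raise NotImplementedError("Abstract property")


class ToyTransformerDecoder(TemporalStateful):
    def __init__(self, encoder_state_size: int) -> None:
        self.encoder_state_size = encoder_state_size

    @property
    def dimension(self) -> int:
        # Computed on access by overriding the inherited property,
        # so nothing is ever assigned to the read-only name.
        return self.encoder_state_size
```

This keeps a single name for the concept, at the price of tying the model dimension to the meaning TemporalStateful gives to dimension.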

jlibovicky requested a review from jindrahelcl March 2, 2018 10:06

jlibovicky self-assigned this Mar 2, 2018

jlibovicky added the enhancement label Mar 2, 2018

varisd requested changes Mar 2, 2018

jindrahelcl requested changes Mar 2, 2018

jlibovicky force-pushed the decoder_stateful branch 2 times, most recently from 4ff79ac to d665a6f Compare March 8, 2018 15:40

jindrahelcl requested changes Mar 8, 2018

jlibovicky force-pushed the decoder_stateful branch from d665a6f to bdd7cf3 Compare March 9, 2018 09:54

jlibovicky added 7 commits July 11, 2018 11:52

make autoregressive decoder temporal stateful

855c2dd

make the seq. labeler no assume encoder has an input sequence

5cda779

make the names in labeler more general

feb6b3a

add comment for computing loss

6ce64e8

fix mypy

8918dd4

rename transformer dimension so it does not collide

cb426d6

document labeling autoregressive decoder

9990b19

jlibovicky added 3 commits July 11, 2018 12:04

multi-task learning with labeler to tests

2c0b915

add runtime labeling to tests

bc0caeb

fix </s> striping

d0da72f

jindrahelcl force-pushed the decoder_stateful branch from f535288 to d0da72f Compare July 11, 2018 10:04

jindrahelcl requested changes Jul 11, 2018

varisd requested changes Aug 1, 2018
